Indexing in EMu

EMu has a number of indexing methods for efficient and timely access to data. An indexing method is an algorithm or set of rules to search data in an indirect way. The simplest type of indexing, known as an exhaustive search, is no index at all. In this case, each record is read sequentially and compared against the search terms entered. If there is a match, the record is added to the set of matching records, then the next record is read. The exhaustive search method is very space efficient as only the data needs to be stored. However, a search may take some time to complete if there is a very large number of records (perhaps several hours for 750,000 records).
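As a rough illustration of the exhaustive search described above, the following Python sketch reads every record in turn and compares it against the search term. It is not EMu code: the record layout (dictionaries with hypothetical "id" and "text" keys) and the function name are assumptions made for the example.

```python
def exhaustive_search(records, term):
    """Return every record whose text contains the search term as a whole word."""
    term = term.lower()
    matches = []
    for record in records:  # each record is read sequentially; no index is used
        words = record["text"].lower().split()
        if term in words:
            matches.append(record)  # add the record to the set of matches
    return matches


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "text": "Bronze axe head"},
    {"id": 2, "text": "Iron sword"},
    {"id": 3, "text": "Bronze mirror"},
]

matching_ids = [r["id"] for r in exhaustive_search(records, "bronze")]
```

No storage beyond the data itself is needed, but the work grows in direct proportion to the number of records, which is why the method becomes slow for large data sets.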

To facilitate the search of large numbers of records, indexes are built that provide rapid access to data that match given search criteria. Indexes provide an indirect means of searching data in a judicious manner: when a search term is entered, the indexes are consulted to produce the matching records. There is a cost associated with indexing: the need to store indexing information along with the data.

Many indexing methods are available to designers of databases, each with its own pros and cons. The EMu database engine employs two flexible indexing methods to provide rapid retrieval of data from large numbers of records:

  • The first is known as linear hashing and provides high speed key retrieval.
  • The second goes by the long name of "A two level superimposed coding scheme for partial match retrieval" (shortened to the Two Level method) and provides a general purpose framework for implementing a wide range of term based searches. A term is simply a sequence of characters that forms the basic entity for searching. For example, in word based searching (where you need only enter a word to find matching records), a term is a word.
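The general idea behind superimposed coding can be sketched in Python as follows. This is a much simplified illustration and not EMu's implementation: the signature width, the number of bits set per term, the segment size and all function names are assumptions chosen for the example. Each term is hashed to a few bit positions, a record's signature is the bitwise OR of its term signatures, and a segment's signature is the OR of the record signatures it contains. A search checks the segment level first, then the record level, and finally confirms each candidate against the actual data, because signatures can produce false matches.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (assumed for illustration)
BITS_PER_TERM = 3   # bits set for each term (assumed)
SEGMENT_SIZE = 2    # records grouped under one segment signature (assumed)


def term_signature(term):
    """Hash a term to a signature with up to BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def record_signature(text):
    """Superimpose (bitwise OR) the signatures of every word in the record."""
    sig = 0
    for word in text.split():
        sig |= term_signature(word)
    return sig


def build_index(records):
    """Return a list of (segment_signature, [(record_signature, text), ...])."""
    segments = []
    for start in range(0, len(records), SEGMENT_SIZE):
        chunk = records[start:start + SEGMENT_SIZE]
        entries = [(record_signature(text), text) for text in chunk]
        seg_sig = 0
        for sig, _ in entries:
            seg_sig |= sig
        segments.append((seg_sig, entries))
    return segments


def search(index, term):
    """Return records containing the term, using both signature levels."""
    query = term_signature(term)
    hits = []
    for seg_sig, entries in index:
        if (seg_sig & query) != query:
            continue  # level 1: no record in this segment can match
        for sig, text in entries:
            # level 2: test the record signature, then verify against the
            # data itself to discard false matches
            if (sig & query) == query and term.lower() in text.lower().split():
                hits.append(text)
    return hits


# Hypothetical sample data, for illustration only.
sample = ["Bronze axe head", "Iron sword", "Bronze mirror", "Clay pot"]
index = build_index(sample)
bronze_hits = search(index, "bronze")
```

Here a term is a word, but the same framework applies to other kinds of term (for example word stems or phonetic keys) by changing what is passed to term_signature.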

A number of pre-configured indexing options are distributed with EMu. In particular, a number of fields that contain name based data already have phonetic based indexing enabled. Many descriptive fields (e.g. Notes) also have stem based indexing enabled.

It is possible to adjust indexing via Registry entries. These entries allow institutions to tune indexing methods to provide the most efficient searching possible without wasting disk space on unused methods.

Users in group Admin can use an Admin Task to view indexing information.

EMu 4.0.01

EMu 3.1 saw the addition of new indexing methods to the database engine. In particular, support was added for NULL indexing (whether a field is empty or not) and PARTIAL indexing (fast searching for leading characters, e.g. a*). Tools were provided that allowed System Administrators to configure, via the EMu Registry, which fields required the new indexing methods. The new indexing facilities did not provide a mechanism for adjusting or tuning range indexes.
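To show what these two kinds of index make possible, the Python sketch below builds a simple "null" index (the set of records whose field is empty) and a simple "partial" index (a sorted list that supports fast leading-character searches such as a*). It only illustrates the concepts: the data structures, function names and the "name" field are assumptions and do not reflect how the EMu engine stores its indexes.

```python
import bisect


def build_null_index(records, field):
    """Set of record ids whose field is missing or empty."""
    return {rid for rid, rec in records.items() if not rec.get(field)}


def build_prefix_index(records, field):
    """Sorted (lowercased value, record id) pairs for non-empty values."""
    return sorted(
        (rec[field].lower(), rid) for rid, rec in records.items() if rec.get(field)
    )


def prefix_search(index, prefix):
    """Return ids of records whose value starts with prefix (e.g. 'a' for a*)."""
    prefix = prefix.lower()
    # Binary search for the first entry not less than the prefix.
    start = bisect.bisect_left(index, (prefix,))
    ids = []
    for value, rid in index[start:]:
        if not value.startswith(prefix):
            break  # sorted order: no later entry can match
        ids.append(rid)
    return ids


# Hypothetical sample data, for illustration only.
records = {
    1: {"name": "Adams"},
    2: {"name": ""},
    3: {"name": "Baker"},
    4: {"name": "allen"},
    5: {},
}

empty_ids = build_null_index(records, "name")
name_index = build_prefix_index(records, "name")
a_star = prefix_search(name_index, "a")
```

In both cases the answer comes from the index alone, without reading every record, which is the benefit the dedicated indexing methods are intended to provide.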

EMu 4.0.01 saw the addition of new tools that permit System Administrators to tune the range indexing used by EMu. Support for automatic optimisation of range indexes has also been added. Using these tools, EMu can now provide optimal range indexes and significantly faster range based searches.